Large Scale Reinforcement Learning using Q-SARSA(λ) and Cascading Neural Networks (M.Sc. Thesis)
Author
Abstract
This thesis explores how the novel model-free reinforcement learning algorithm Q-SARSA(λ) can be combined with the constructive neural network training algorithm Cascade 2, and how this combination can scale to the large problem of backgammon. For reinforcement learning to scale to larger problem sizes, it needs to be combined with a function approximator such as an artificial neural network. Reinforcement learning has traditionally been combined with simple incremental neural network training algorithms, but more advanced training algorithms such as Cascade 2 exist that have the potential to achieve much higher performance. All of these advanced training algorithms are, however, batch algorithms, and since reinforcement learning is incremental, this poses a challenge. So far the potential of the advanced algorithms has not been fully exploited, and the few combination methods that have been tested have failed to produce a solution that can scale to larger problems.

The standard reinforcement learning algorithms used in combination with neural networks are Q(λ) and SARSA(λ), which for this thesis have been combined to form the Q-SARSA(λ) algorithm. This algorithm has been combined with the Cascade 2 neural network training algorithm, which is especially interesting because it is a constructive algorithm that can grow a neural network by gradually adding neurons. To combine Cascade 2 and Q-SARSA(λ), two new methods have been developed: the NFQ-SARSA(λ) algorithm, an enhanced version of Neural Fitted Q Iteration, and the novel sliding window cache.

The sliding window cache and Cascade 2 are tested on the medium-sized mountain car and cart pole problems and on the large backgammon problem. The tests show that Q-SARSA(λ) performs better than Q(λ) and SARSA(λ), and that the sliding window cache in combination with Cascade 2 and Q-SARSA(λ) performs significantly better than incrementally trained reinforcement learning. For the cart pole problem the algorithm performs especially well: it learns a policy that balances the pole for the full 300 steps after only 300 episodes of learning, and the resulting neural network contains only one hidden neuron. This should be compared to 262 steps for the incremental algorithm after 10,000 episodes of learning. The sliding window cache also scales well to the large backgammon problem, winning 78% of the games against a heuristic player, while incremental training wins only 73%. The NFQ-SARSA(λ) algorithm likewise outperforms the incremental algorithm on the medium-sized problems, but it is not able to scale to backgammon. The sliding window cache in combination with Cascade 2 and Q-SARSA(λ) thus performs better than incrementally trained reinforcement learning for both medium-sized and large problems, and it is the first combination of advanced neural network training algorithms and reinforcement learning that scales to larger problems.
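The abstract does not give the update rule itself, but since Q-SARSA(λ) is described as a combination of Q(λ) and SARSA(λ), one natural reading is a temporal-difference target that blends the off-policy maximum (Q-learning) with the on-policy action value (SARSA). The minimal tabular sketch below illustrates that idea in Python; the mixing parameter SIGMA, the replacing eligibility traces, the ε-greedy policy, and all constants are illustrative assumptions, not the thesis's actual formulation.

import random
from collections import defaultdict

# Illustrative sketch of a Q-SARSA(lambda)-style update (not the thesis's
# exact rule). SIGMA = 0 would give a pure Q(lambda) target and SIGMA = 1
# a pure SARSA(lambda) target; the blend is an assumption for illustration.

ALPHA, GAMMA, LAMBDA, SIGMA, EPSILON = 0.1, 0.95, 0.9, 0.5, 0.1

def epsilon_greedy(q, state, actions):
    """Pick a random action with probability EPSILON, else a greedy one."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

def q_sarsa_update(q, trace, s, a, reward, s_next, a_next, actions):
    """One blended TD update with replacing eligibility traces."""
    greedy = max(q[(s_next, an)] for an in actions)   # Q-learning component
    on_policy = q[(s_next, a_next)]                   # SARSA component
    target = reward + GAMMA * ((1 - SIGMA) * greedy + SIGMA * on_policy)
    delta = target - q[(s, a)]
    trace[(s, a)] = 1.0                               # replacing trace
    for key in list(trace):                           # update and decay traces
        q[key] += ALPHA * delta * trace[key]
        trace[key] *= GAMMA * LAMBDA

# Usage: q, trace = defaultdict(float), defaultdict(float), then call
# q_sarsa_update once per transition within an episode.

In the thesis the table would be replaced by a network trained with Cascade 2; a sliding window cache holding the most recent (state, action, target) examples would then give the batch trainer a stable data set to fit. That is consistent with the abstract's description, but again a sketch rather than the thesis's exact design.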
Similar Resources
Storage System Management Using Reinforcement Learning Techniques and Nonlinear Models
In this thesis, modeling and optimization in the field of storage management under stochastic conditions will be investigated using two different methodologies: Simulation Optimization Techniques (SOT), which are usually categorized in the area of Reinforcement Learning (RL), and Nonlinear Modeling Techniques (NMT). For the first set of methods, simulation plays a fundamental role in evaluating ...
Pragmatically Framed Cross-Situational Noun Learning Using Computational Reinforcement Models
Cross-situational learning and social pragmatic theories are prominent mechanisms for learning word meanings (i.e., word-object pairs). In this paper, the role of reinforcement is investigated for early word-learning by an artificial agent. When exposed to a group of speakers, the agent comes to understand an initial set of vocabulary items belonging to the language used by the group. Both cros...
Double Q(σ) and Q(σ, λ): Unifying Reinforcement Learning Control Algorithms
Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most widely used TD algorithms. The Q(σ) algorithm (Sutton and Barto, 2017) unifies both. This paper extends the Q(σ) algorithm to an online multi-step algorithm Q(σ, λ) using eligibility traces and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments sugges...
The Fuzzy Sarsa(λ) Learning Approach Applied to a Strategic Route Learning Robot Behaviour
This paper presents a novel Fuzzy Sarsa(λ) Learning (FSλL) approach applied to a strategic route learning task of a mobile robot. FSλL is a hybrid architecture that combines Reinforcement Learning and Fuzzy Logic control. The Sarsa(λ) Learning algorithm is used to tune the rule base of a Fuzzy Logic controller, which has been tested in a route learning task. The robot explores its environment usi...